V2 Validating mousetracking

First mouse-tracking experiment with transitions and rewards revealed on hover. Simple behavioral analyses suggest that the hovers provide some signal about planning, but not in exactly the way we expected.

Methods

full experiment demo

  • take N steps in an undirected graph to maximize reward
  • reward and transition functions change every trial
  • hover over a state to reveal both its reward and connected states

Procedure

  • introduce graph and how to move
  • collect each of the four possible rewards {-10, -5, 5, 10} once
  • ternary choice trials to check image/reward learning
    • each block has two trials where each possible reward (besides the very worst) should be chosen (six trials total)
  • the choice set always includes the next-best reward, e.g. for a trial where 5 is the best option, the choice set would be {-10, -5, 5} or {-10, 5, 5}
    • repeat with randomly generated choice sets until perfect performance, or fail out of experiment after 5 blocks
  • introduce multi-step
  • check that they understand they can go back to a previously visited state
  • practice trials with fully revealed graph
  • start varying transition function from trial to trial
  • introduce hidden rewards/transitions and hovering to reveal
  • three full practice rounds
  • 32 main trials, which are analyzed

Results

Example trials

These gifs show the 30th trial from each participant, at approximately real-time speed. Click a gif to expand it and watch from the beginning. You can view all trials ✨here✨.

We begin by addressing the two main planned analyses from the grant. Then we consider the data from a broader perspective.

N.B.

  • I’m using “fixation” to refer to a period of time when a person is hovering over one of the states (similar for “fixating” a state)
  • P01 is participant 1; P01-T01 is the first (non-practice) trial for participant 1

In the current draft of the grant, we predict: (1A) People’s fixations should be predictive of the choices they make. Does this hold in the mouse hovering data?

tl;dr

Search is certainly predictive of choice, but it’s not clear how it affects choice; the relationship is not as we predicted. The bidirectional causal relationship between consideration and intention complicates things.

Probability of visiting a state by interaction of reward and fixation

Intuitively, people’s decisions should be more influenced by rewards associated with fixated (vs. un-fixated) states. However, if we take this literally, it is necessarily true, because you can’t possibly visit a state without first hovering over it. Thus, we exclude “fixations” that immediately precede a click. That is, we predict visiting a state based on whether the state was previously considered.

v2-consideration_choice.png

Logistic regression: visited ~ reward * previously_considered

| Term                         | Est.   | S.E.  | z val. | p        |
|------------------------------|--------|-------|--------|----------|
| reward                       | 2.077  | 0.450 | 4.618  | p < .001 |
| previously_considered        | 2.303  | 0.503 | 4.578  | p < .001 |
| reward:previously_considered | -0.352 | 0.462 | -0.763 | p = .445 |

We see a strong main effect of fixation on path choice: people are more likely to visit states that they have looked at previously. However, there is no interaction with reward. You’re more likely to visit a state that you considered previously regardless of its reward.
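The regression above can be sketched as follows with statsmodels, which accepts the same R-style formula. The data here are simulated with main effects only (mirroring the reported pattern); the column names are assumptions about the analysis dataframe, not the actual pipeline.

```python
# Sketch of the logistic regression `visited ~ reward * previously_considered`.
# Simulated data; coefficients and column names are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "reward": rng.choice([-10, -5, 5, 10], size=n) / 10.0,  # rescaled
    "previously_considered": rng.integers(0, 2, size=n),
})
# Generate choices from main effects only (no interaction term)
logit = 2.0 * df["reward"] + 2.3 * df["previously_considered"] - 1.0
df["visited"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("visited ~ reward * previously_considered", data=df).fit(disp=0)
print(model.params)
```

The `*` in the formula expands to both main effects plus the `reward:previously_considered` interaction, matching the three rows of the table.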

Interaction of reward and proportion fixation time

The above result could potentially be explained by people’s tendency to fixate every state. What if we use a continuous measure of attention? Looking at the proportion of fixation time, we see an even stronger version of the same thing. States that receive a lot of attention are likely to be visited regardless of their reward.

v2-prop_fixation_choice.png

Logistic regression: visited ~ reward * prop_fixated

| Term                | Est.   | S.E.  | z val. | p        |
|---------------------|--------|-------|--------|----------|
| reward              | 1.732  | 0.351 | 4.939  | p < .001 |
| prop_fixated        | 31.033 | 2.290 | 13.549 | p < .001 |
| reward:prop_fixated | -1.387 | 2.306 | -0.602 | p = .547 |

Possible explanation

What’s going on here? One possible explanation is that people consider courses of action that they intend to take. That is, there is a bidirectional relationship between the current plan and the search process. This prevents us from getting a clean measurement of the effect of search on choice. It seems that the tendency to consider paths that you already intend to take is so strong that this washes out the other causal direction (consideration × value → intention).

```mermaid
flowchart LR
    Value --> Consideration
    Consideration --> Intention
    Intention --> Consideration
    Value --> Intention
    Intention --> Choice
```

Is search directed towards rewarding states?

tl;dr

Yes. People are less likely to continue searching down a path ending in a negative reward. And when they do switch paths, they tend to switch to higher-value ones.

Our second main prediction is: (1B) People’s fixations should be preferentially directed towards high-reward states. Note that this is a general prediction consistent with both best-first search and pruning.

Probability of continuing a path by reward of last state on path

A simple indicator of reward-directed search is the probability that people continue planning down a path as a function of the rewards revealed so far. Indeed, we see that people are more likely to continue a rollout when the last-fixated state is rewarding.

v2-continue_chain.png

Note that we don’t count going back to the previously fixated state as “continuing”, even though it is technically a valid next state (because the graph is undirected). It’s more likely that this reflects going back to the previous node in the decision tree.
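The analysis above amounts to grouping fixations by the just-revealed reward and averaging a "continued" indicator. A minimal sketch, assuming a fixation log with one row per fixation (all column names are hypothetical; revisits of the immediately preceding state would already be coded as not continuing):

```python
# Sketch: P(continue current path) by the reward revealed at the last fixation.
# `reward` = reward of the just-fixated state; `continued` = 1 if the next
# fixation extended the same path. Column names are illustrative assumptions.
import pandas as pd

def continuation_rate(fixations: pd.DataFrame) -> pd.Series:
    """Mean continuation probability, grouped by just-revealed reward."""
    return fixations.groupby("reward")["continued"].mean()

toy = pd.DataFrame({
    "reward":    [-10, -10, -5, 5, 5, 10, 10, 10],
    "continued": [  0,   0,  1, 1, 0,  1,  1,  1],
})
print(continuation_rate(toy))
```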

Jumping decisions

The previous analysis focuses on whether to continue a rollout. What do people do when they decide not to continue with the current path? Do they preferentially “jump” to states with higher expected value? Unfortunately, we can’t really define expected value in this task without a model (how do you handle unknown transitions?). For now, we assume the transitions are known, but account for which rewards have been seen.

We see that people do show a slight tendency to fixate states with higher value (compared to the average value of other states they could have fixated). It’s a small effect, but it’s statistically significant (t-test vs. 0: \(p < .001\)). Excluding the zero cases, relative value is positive 60% of the time.

v2-jumps.png
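The quantity being tested is the value of the jumped-to state minus the mean value of the other states that could have been fixated instead, compared against zero with a one-sample t-test. A sketch with toy numbers (the function name and inputs are hypothetical):

```python
# Sketch of the jump analysis: relative value of the chosen jump target
# versus the mean of the alternatives, tested against 0. Toy data only.
import numpy as np
from scipy import stats

def relative_value(target_value, alternative_values):
    """Value of the jumped-to state minus the mean of the alternatives."""
    return target_value - float(np.mean(alternative_values))

# One relative-value score per jump (illustrative numbers):
rvs = [
    relative_value(5, [-10, -5]),   # 12.5
    relative_value(10, [5, -5]),    # 10.0
    relative_value(-5, [-10, 5]),   # -2.5
]
t, p = stats.ttest_1samp(rvs, 0.0)
```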

Basic behavior

For completeness, here we look at some simple performance metrics, broken down by participant.

Relative score

Relative score is defined as (human - avg_random) / (optimal - avg_random). People usually get a perfect score (1.0), suggesting that the task might be too easy.

v2-relative_score.png
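The relative-score formula, as a function (a sketch; variable names follow the definition in the text, so 1.0 means optimal play and 0.0 means the random baseline):

```python
# Relative score: (human - avg_random) / (optimal - avg_random).
# 1.0 = optimal performance, 0.0 = the average-random baseline.
def relative_score(human: float, optimal: float, avg_random: float) -> float:
    return (human - avg_random) / (optimal - avg_random)

# e.g. scoring 18 when optimal is 20 and random averages 4:
relative_score(18, 20, 4)  # → 0.875
```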

Number of states fixated

People fixate every state on about half of all trials. This presents a challenge for any analysis focusing on whether or not a state was fixated.

v2-n_fixated.png

Total number of fixations

v2-n_fix.png

Fixation durations

v2-duration.png

Next up

  • continuance probability by full path value
    • but how to define full path value?
    • maybe the full expected value given all revealed information?
      • that’s actually really hard to determine because the transition function is not known!
      • I guess you could integrate across uncertainty in the transitions?
  • backwards vs forward
    • undirected really kills us here
    • people do tend to fixate the starting position first (not discussed above)
  • look for a scanning stage